ABSTRACT

We present a detailed description of the Voronoi Tessellation (VT) cluster finder algorithm in 2+1
dimensions, which improves on past implementations of this technique. The need for cluster finder
algorithms able to produce reliable cluster catalogs up to redshift 1 or beyond and down to $10^{13.5}$
solar masses is paramount, especially in light of upcoming surveys aiming at cosmological constraints
from galaxy cluster number counts. We build the VT in photometric redshift shells and use the
two-point correlation function of the galaxies in the field both to determine the density threshold for
detection of cluster candidates and to establish their significance. This allows us to detect clusters in
a self-consistent way without any assumptions about their astrophysical properties. We apply the VT
to mock catalogs which extend to redshift 1.4, reproducing the ΛCDM cosmology and the clustering
properties observed in the SDSS data. An objective estimate of the cluster selection function in terms
of the completeness and purity as a function of mass and redshift is as important as having a reliable
cluster finder. We measure these quantities by matching the VT cluster catalog with the mock truth
table. We show that the VT can produce a cluster catalog with completeness and purity > 80% for
the redshift range up to ~ 1 and mass range down to ~ $10^{13.5}$ solar masses.

Subject headings: Cosmology: observations - Galaxies: clusters: general - Methods: data analysis 



1. INTRODUCTION 

Today we recognize that galaxies constitute a very 
small fraction of the total mass of a cluster, but they are 
nevertheless some of the clearest signposts for detection 
of these massive systems. Furthermore, the extensive ev- 
idence for differential evolution between galaxies in clus- 
ters and the field - and its sensitivity to the underlying 
cosmological model - means that it is imperative to quan- 
tify the galactic content of clusters. Perhaps even more 
importantly, optical detection of galaxy clusters is now 
inexpensive both financially and observationally. Large 
arrays of CCD detectors on moderate telescopes can be 
utilized to perform all-sky surveys with which we can detect
clusters to z ~ 1, and even further with IR mosaics.

Forthcoming projects such as the Dark Energy
Survey (DES, darkenergysurvey.org), Pan-STARRS
(pan-starrs.ifa.hawaii.edu) and the Large Synoptic
Survey Telescope (LSST, lsst.org) will map thousands
of square degrees to very faint limits (~29th magnitude
per square arcsecond) in at least five filters, allowing the
detection of clusters through their weak lensing signal
as well as directly through the visible galaxies. Combined
with ever more efficient cluster-finding algorithms,
these programs will expand optical cluster detection to
redshifts greater than unity. The prospects for using
such data to address one of the most important scientific
problems of our time, by measuring the cosmological
parameters with improved precision, are outstanding.
In fact, given the statistical power of these surveys,
clusters have become one of the strongest probes for
dark energy (e.g., Haiman et al. 2001; Holder et al. 2001;
Levine et al. 2002; Hu 2003; Rozo et al. 2007, 2010).
Two unavoidable challenges imposed by these projects
are to produce optimal cluster catalogs, with high
completeness and purity, and to determine their selection
function as a function of cluster mass and redshift.

To see how to proceed, we must understand the
strengths and important limitations of techniques in
use today, especially with respect to how well the
resulting catalogs can be characterized. We focus on
photometric techniques rather than on cluster finding in
redshift space, which also has a long history, starting with
Huchra & Geller (1982), and has been successfully
applied to spectroscopic redshift survey data such as
2dFGRS (Eke et al. 2004) and DEEP2 (Gerke et al. 2005).
Although the VT uses redshift information, it is a
photometric technique and this motivates a discussion focused
on this class of cluster finders.

The earliest surveys relied on visual inspection of vast
numbers of photographic plates, usually by a single
astronomer. The true pioneering work in this field did
not appear until the late fifties, upon the publication
of a catalog of galaxy clusters produced by Abell (1958),
which remained the most cited and utilized resource for
both galaxy population and cosmological studies with
clusters for over forty years. Abell, Corwin, & Olowin
(1989, hereafter ACO) published an improved and
expanded catalog, now including the Southern sky. These
catalogs have been the foundation for many cosmological
studies over the last decades, even with serious concerns
about their reliability. Despite the numerical criteria laid
out to define clusters in the Abell and ACO catalogs,
their reliance on the human eye and use of older
technology and a single filter led to various biases. These old
catalogs suffered as much from being black and white as
they did from being eye-selected. Even more disturbing,
measures of completeness and contamination in the Abell
catalog disagree by factors of a few. Unfortunately, some
of these problems will plague any optically selected
cluster sample, but the use of color information, objective
selection criteria and a strong statistical understanding
of the catalog can mitigate their effects.

Only in the past twenty years has it become possible
to utilize the objectivity of computational algorithms in
the search for galaxy clusters. These more modern
studies required that plates be digitized, so that the data
are in machine readable form. The hybrid technology of
digitized plate surveys blossomed into a cottage
industry. The first objective catalog produced was the
Edinburgh/Durham Cluster Catalog (EDCC; Lumsden et al.
1992), which covered 0.5 sr (~ 1,600 square degrees)
around the South Galactic Pole (SGP). Later, the APM
cluster catalog (Dalton et al. 1997) was created by
applying Abell-like criteria to select overdensities from the
galaxy catalogs. The largest, most recent, and the last
of the photo-digital cluster surveys is the Northern Sky
Optical Survey (NoSOCS; Gal et al. 2000, 2003, 2009;
Lopes et al. 2004). This survey relies on galaxy catalogs
created from scans of the second generation Palomar Sky
Survey plates, input to an adaptive kernel galaxy density
mapping routine. The final catalog covers 11,733 square
degrees, with nearly 16,000 candidate clusters, extending
to z ~ 0.3. A supplemental catalog up to z ~ 0.5 was
generated by Lopes et al. (2004) using Voronoi Tessellation
and Adaptive Kernel maps.

With the advent of CCDs, fully digital imaging in
astronomy became a reality. These detectors provided
an order-of-magnitude increase in sensitivity, linear
response to light, small pixel size, stability, and much easier
calibration. The main drawback relative to photographic
plates was (and remains) their small physical size, which
permits only a small area (of order 15') to be imaged by a
larger $4096^2$ pixel detector. Realizing the vast scientific
potential of such a survey, an international collaboration
embarked on the Sloan Digital Sky Survey (SDSS,
sdss.org), which included construction of a specialized
2.5 meter telescope, a camera with a mosaic of 30 CCDs,
a novel observing strategy, and automated pipelines for
survey operations and data processing. Main survey
operations were completed in the fall of 2005, with over
8,000 square degrees of the northern sky imaged in five
filters to a depth of r' ~ 22.2 with calibration accurate
to ~ 1-2%, as well as spectroscopy of nearly one million
objects.

With such a rich dataset, many groups both internal 
and external to the SDSS collaboration have generated 
a variety of cluster catalogs, from both the photometric 
and the spectroscopic catalogs, using techniques includ- 
ing: 

1. Voronoi Tessellation (Kim et al. 2002)

2. Overdensities in both spatial and color space
(maxBCG; Annis et al. 1999; Koester et al. 2007b)

3. Subdividing by color and making density maps
(Cut-and-Enhance; Goto et al. 2002)

4. The Matched Filter and its variants (Kim et al.
2002)

5. Surface brightness enhancements (Zaritsky et al.
1997, 2002; Bartelmann & White 2003)

6. Overdensities in position and color spaces,
including redshifts (C4; Miller et al. 2005)

7. Friends-of-Friends (FoF; Berlind et al. 2006)

Each method generates a different catalog, and early
attempts to compare them have shown not only that
they are quite distinct, but also that comparison
of two photometrically-derived cluster catalogs, even
from the same galaxy catalog, is not straightforward
(Bahcall et al. 2003).

In addition to the SDSS, smaller areas, but to much
higher redshift, have been covered by numerous deep
CCD imaging surveys. Notable examples include the
Palomar Distant Cluster Survey (PDCS; Postman et al.
1996), the ESO Imaging Survey (EIS; Lobo et al. 2000),
and many others. None of these surveys provides the
angular coverage necessary for large-scale structure and
precision cosmology studies; they were specifically
designed to find rich clusters at high redshift. The largest
such survey to date is the Red Sequence Cluster Survey
(RCS; Gladders & Yee 2005), which is based on moderately
deep two-band imaging using the CFH12K mosaic camera on
the CFHT 3.6m telescope and covers ~ 100 square degrees.
This area coverage is comparable to X-ray surveys
designed to detect clusters at z ~ 1 (Vikhlinin et al. 2009).

Any cluster survey must make many different math- 
ematical and methodological choices. Regardless of the 
data set and algorithm used, a few simple rules should be 
followed to produce a catalog that is useful for statistical 
studies of galaxy populations and for cosmological tests: 

1. Cluster detection should be performed by an ob- 
jective, automated algorithm to minimize human 
biases. 

2. The algorithm utilized should impose minimal con- 
straints on the physical properties of the clusters, 
to avoid selection biases. Any remaining biases 
must be properly characterized. 

3. The sample selection function must be well- 
understood, in terms of both completeness and pu- 
rity, as a function of both redshift and mass. The 
effects of varying the cluster model on the determi- 
nation of these functions must also be known. 

4. The catalog should provide basic physical prop- 
erties for all the detected clusters, including es- 
timates of their distances and some mass proxy 
(richness, luminosity, overdensity) such that spe- 
cific subsamples can be selected for future study. 

One of the most popular and commonly used methods
today is the Voronoi Tessellation (VT; Ramella et al.
2001; Kim et al. 2002; Lopes et al. 2004). Our
implementation of this technique is described in detail in
§2. Briefly, it subdivides a spatial distribution into a
unique set of polygonal cells, one for each object, with
the cell size inversely proportional to the local density.
One then defines a galaxy cluster as a high density
region, composed of small adjacent cells. Voronoi
Tessellation satisfies the above criteria for generating
statistical, objective, cluster samples. It requires no a
priori assumption on galaxy colors, the presence of a red
sequence, a specific cluster profile or luminosity
function. Mock catalogs have been used to test the
efficiency of the detection algorithm. These attractive
qualities have led to its employment in numerous projects
beginning almost 20 years ago (van de Weygaert & Icke
1989; Ikeuchi & Turner 1991; van de Weygaert 1993;
Zaninetti 1995; El-Ad et al. 1996; Doroshkevich et al.
1997). Ebeling & Wiedenmann (1993) used VT to
identify X-ray sources as overdensities in X-ray photon
counts. Kim et al. (2002), Ramella et al. (2001)
and Lopes et al. (2004) looked for galaxy clusters using
VT. van Breukelen & Clewley (2009) included the
VT as one of two methods in their 2TecX detection
algorithm, an extension of their work on clusters in
UKIDSS (van Breukelen et al. 2006). Barkhouse et al.
(2006) used the VT to detect clusters on optical images
of X-ray Chandra fields. Diehl & Statler (2006) applied
a modified version of the VT algorithm to X-ray data.

Here we improve on past implementations of this
technique, focusing on optical data. We build the VT in
photometric redshift shells and use the two-point correlation
function of the galaxies in the field to determine the
density threshold for detection of cluster candidates and to
establish their significance. This allows us to detect
clusters in a self-consistent way using a minimum set of free
parameters and without any assumptions about the
astrophysical properties of the clusters. We provide a list
of member galaxies for each cluster and use the number
of members as a proxy for mass. We apply the VT
to mock catalogs that accurately reproduce the ΛCDM
cosmology and the clustering properties observed in the
SDSS data. By comparing the VT cluster catalog with
the truth table, we measure the completeness and purity
of our cluster catalog as a function of mass and redshift.
We show that our implementation of the VT produces a
reliable cluster catalog up to redshift ~ 1 and down to
~ $10^{13.5}$ solar masses.

The paper is organized as follows: §2 is dedicated to a
detailed presentation of the algorithm; §3 describes the
method used to compute the selection function of the
cluster catalog; in §4 we discuss the completeness and
purity results and show our ability to recover the mass
function of the mock catalog at redshift close to unity;
§5 presents a summary of this work. The work on the
relation between the two-point correlation function and
the distribution of VT cell areas, which is fundamental to
the development of our method, is detailed in the Appendix.

2. ALGORITHM 

We present the VT cluster finder in 2+1 dimensions.
The method is non-parametric and does not smooth the
data, making the detection independent of the cluster
shape. It uses all galaxies available, going as far down in
the luminosity function as the input catalog permits. It
does not rely on the existence of features such as a unique
brightest cluster galaxy or a tight ridgeline in the
color-magnitude space. It works in shells of redshift, treating
each shell as an independent 2-dimensional field.

Central to the VT algorithm is the background over
which an overdensity must rise to be identified as a
cluster. In contrast to earlier implementations of the VT
algorithm (Ebeling & Wiedenmann 1993; Ramella et al.
2001; Kim et al. 2002; Lopes et al. 2004), we do not
assume a Poissonian background. We use the more realistic
assumption that the angular two-point correlation
function of the background galaxy distribution is represented
by a power-law (e.g., Connolly et al. 2002). Another
improvement over earlier works on VT-based cluster
finders is the use of photometric redshifts instead of
magnitudes (Ramella et al. 2001; Lopes et al. 2004) or colors
(Kim et al. 2002). This eliminates the need for a
percolation step and allows for a cluster finder which is not
based on astrophysical properties of clusters (the
luminosity function or color-magnitude relation), but on the
characteristics of the large scale clustering process. This
makes the VT a cluster finder subject to different
systematics from color-based methods.

The fundamental inputs required for cluster detection
using the VT are the coordinates RA, Dec and redshift
of each galaxy and the redshift error $\sigma_z(z)$ for the
full galaxy sample. The input catalog is sliced in
non-overlapping 1-$\sigma_z$ wide redshift shells. Note that the
velocity dispersion of a typical cluster is much smaller than
realistic values of $\sigma_z$. For each shell an estimate of the
parameters $(A, \gamma)$ of the two-point correlation function is
required. This can be obtained directly from the data.
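The slicing step can be illustrated with a minimal sketch. It assumes that each shell is one $\sigma_z$ wide as evaluated at its lower edge, and it uses a hypothetical error model $\sigma_z(z) = 0.03(1+z)$; neither the function names nor the error model come from the text.

```python
import numpy as np

def shell_edges(z_min, z_max, sigma_z):
    """Edges of contiguous shells, each sigma_z(z) wide at its lower edge.

    sigma_z must return a strictly positive width, otherwise the loop
    below would never terminate.
    """
    edges = [z_min]
    while edges[-1] < z_max:
        edges.append(edges[-1] + sigma_z(edges[-1]))
    return np.array(edges)

def assign_shells(z, edges):
    """Index of the shell containing each galaxy (-1 if outside all shells)."""
    z = np.asarray(z, dtype=float)
    idx = np.searchsorted(edges, z, side="right") - 1
    idx[(z < edges[0]) | (z >= edges[-1])] = -1
    return idx

# Hypothetical photometric redshift error model (an assumption).
def sigma_z(z):
    return 0.03 * (1.0 + z)

edges = shell_edges(0.1, 1.0, sigma_z)
shell_index = assign_shells([0.1, 0.05, 0.5], edges)
```

Because the shells do not overlap, every galaxy inside the redshift range belongs to exactly one shell.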

We then build a Voronoi diagram and compare the 
distribution of cell areas with the distribution expected 
from a background-dominated field. Since small cell size 
implies high density, this allows us to establish a size 
threshold below which the distribution is dominated by 
cluster members. The most significant clumps of contigu- 
ous cells smaller than this threshold are listed as clusters. 
This procedure is repeated on all redshift shells and the 
results are merged into a unique list of cluster candidates. 
The merge proceeds as follows. From the input galaxy 
catalog we extract 3-dimensional boxes centered at the 
coordinates of each candidate. We run the VT on those 
boxes to confirm the detection. This recursive procedure 
eliminates the edge effects at the interface between suc- 
cessive shells, reduces the number of fake detections due 
to projection effects and eliminates multiple detections. 

In the resulting cluster catalog, we report position, red- 
shift, redshift error, galaxy density contrast, significance 
of detection, richness, size and shape parameters of the 
clusters. We also provide a list of members with the local 
density of their respective cells and flags indicating the 
central galaxy (the galaxy found in the highest density
cell).

Although it is possible to build Voronoi diagrams on a
sphere, we use a rectangular coordinate system, which is
easier to implement. This implies that we must process
small sky areas at a time to avoid distortions due to
tangential projection. We have tested different area sizes
and concluded that boxes of 3 x 3 degrees are adequate.
A buffer region is implemented to avoid edge effects and
the effective area is the central 1 x 1 square degree box.
Clusters found in the buffer regions are rejected prior
to the merging of the shells' candidate lists. The size
of the buffer zone corresponds to the angular scale of
a large cluster at the lowest redshift (a 1 degree scale
corresponds to ~ 3 Mpc at z = 0.05).

In the following, we detail each step of the cluster
detection process and explain how each of the above
quantities is derived, justifying the choices made in designing
the algorithm.

2.1. VT construction 

The Voronoi diagram of a 2-dimensional distribution
of points is a unique, non-arbitrary and non-parametric
fragmentation of the area into polygons. A simple
algorithm to perform such fragmentation is the following
(see Fig. 1): starting from any position $P_1$, we label its
nearest neighbor $P_2$ and walk along the perpendicular
bisector between those points. We stop when we reach
for the first time a point $Q_1$ equidistant from $P_1$, $P_2$ and
any third point $P_3$. We now walk along the perpendicular
bisector between $P_1$ and $P_3$ until we reach the point
$Q_2$ and identify the next point $P_4$ by the same criterion.
Successive repetition of this process will eventually bring
us back to $Q_1$ after a finite number of steps. The set
of points $Q_i$ are the vertices of a polygon, the Voronoi
cell, associated with $P_1$. If this process is repeated for
each point $P_i$ we will have built the Voronoi tessellation
corresponding to this point field.




Fig. 1. - A portion of a typical Voronoi tessellation is shown
together with its dual Delaunay mesh (solid and dashed lines,
respectively) to illustrate the Voronoi diagram building process. For
each generator set $P_i$, there is one and only one set of Voronoi cells
given by the vertices $Q_i$. See text for details.

However, there are several more robust and efficient
computational algorithms to build a Voronoi diagram
from a given distribution. In our code we use the
so-called Divide & Conquer algorithm implemented in the
Triangle library (Shewchuk 1996). The D&C is based on
recursive partition and local triangulation of the points
and then on a merging stage. The total running time
for a set of n points is O(n log n).



There are no arbitrary choices in building the VT. The 
cell edges are segments of the perpendicular bisectors 
between neighbor points and each vertex is an intersec- 
tion of two bisectors. This implies that the cells will be 
smaller in the high-density regions and since each cell 
contains one and only one point, the inverse of the cell 
area gives the local density. The VT cluster finder takes 
advantage of this fact in the process of detection. 
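As an illustration of how the inverse cell area yields a local density, the following minimal sketch uses `scipy.spatial.Voronoi` (which wraps Qhull) rather than the Triangle library used in our code; the function name is hypothetical. Cells on the edge of the field are unbounded and are assigned NaN here, which is one reason a buffer region is needed in practice.

```python
import numpy as np
from scipy.spatial import ConvexHull, Voronoi

def voronoi_cell_densities(points):
    """Return 1/area for each point's Voronoi cell (NaN for unbounded cells)."""
    points = np.asarray(points, dtype=float)
    vor = Voronoi(points)
    density = np.full(len(points), np.nan)
    for i, region_index in enumerate(vor.point_region):
        region = vor.regions[region_index]
        if len(region) == 0 or -1 in region:
            continue  # cell extends to infinity at the edge of the field
        # Voronoi cells are convex, so the hull of the cell vertices is the
        # cell itself; in 2D, ConvexHull.volume is the enclosed area.
        area = ConvexHull(vor.vertices[region]).volume
        density[i] = 1.0 / area
    return density

# A central point surrounded by four neighbors at unit distance: the central
# cell is the unit square bounded by the four perpendicular bisectors.
cross = np.array([[0.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
cross_density = voronoi_cell_densities(cross)
```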

2.2. Cluster candidate detection 

Each realization of a given point process will result
in a distinct tessellation, but the distribution of
Voronoi cell areas will be the same. The case of the
Poisson point process has been extensively investigated and
it has been shown (Kiang 1966) that the resulting
distribution of Voronoi cell areas is well fitted by a gamma
distribution

$$p(x) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\beta x} \qquad (1)$$

with $\beta = \alpha = 4$ (only for the Poisson case) and x being
the cell area normalized by the mean area of all cells.
Here we extend Kiang's formula to a more general case.

Consider a random distribution of points in a plane
with two-point correlation function given by $w(\theta) =
A\theta^{1-\gamma}$, where the variable $\theta$ is the separation between
point pairs and the parameters A and $\gamma$ are respectively
the amplitude and slope of the power-law. The Poisson
distribution is the particular case where A = 0. A
general relation between the statistics of the point field and
the distribution of VT areas remains a conjecture yet to
be proved, but in the case of a point field generated from
the above two-point correlation function, the gamma
distribution still holds with the values of $\alpha$ and $\beta$ modified.
We have verified this fact and obtained the relation
between $\alpha$, $\beta$ and the parameters A, $\gamma$ numerically. Using
the simulated annealing method described in the context
of materials science (Rintoul & Torquato 1997) we
generate test fields spanning a wide range of A, $\gamma$ pairs. On
each test field we applied the VT algorithm and obtained
the corresponding distribution of cell areas, fitting Eq. 1
to obtain the corresponding pair $\alpha$, $\beta$. These two
parameters are not independent. They are related by a simple
relation: $\beta = \alpha - 0.26$. See the Appendix for a detailed
discussion of these results.
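For reference, Eq. 1 can be evaluated directly. The minimal sketch below (function and variable names are illustrative, not taken from our code) checks that the Poisson case $\alpha = \beta = 4$ is normalized and has unit mean, as expected for areas normalized by the mean cell area.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma as gamma_function

def cell_area_pdf(x, alpha, beta):
    """Eq. 1: gamma distribution of normalized Voronoi cell areas."""
    return (beta**alpha / gamma_function(alpha)
            * x**(alpha - 1.0) * np.exp(-beta * x))

# Poisson (uncorrelated) case: alpha = beta = 4.
norm, _ = quad(cell_area_pdf, 0.0, np.inf, args=(4.0, 4.0))
mean, _ = quad(lambda x: x * cell_area_pdf(x, 4.0, 4.0), 0.0, np.inf)
```

Both integrals evaluate to unity within the numerical tolerance of the quadrature.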

Information about the background is given to the VT
code via the two input parameters A, $\gamma$. These will
depend on the redshift shell and, ideally, they should be
estimated directly from the data being considered. High
accuracy in the parameters is not required, though.
Note that no free parameters are introduced by A and $\gamma$,
since they can be completely determined from the global
input galaxy catalog. Clusters and groups present in
the field when the two-point correlation function is
measured do not affect the cluster finder. On the contrary,
our method is based on the idea that the clustering
process resulting in the power-law described by A and $\gamma$ also
results in the formation of clusters, which are found in
the high density end of the VT cell distribution.

Taking the differential probability distribution (Eq. 1) as a
function of the normalized cell density, $\delta = 1/x$, our goal
is to identify a density threshold $\delta^*$ above which the
contribution of the clusters starts to dominate over the
background. A schematic example is shown in Fig. 2. To the
background distribution given by A = 0.005 and $\gamma$ = 1.7
(upper panel, dashed line), we add a cluster contribution
of 10% given by a simple Gaussian (upper panel, dotted
line). As a result, the total distribution is distorted by
the presence of the clusters. To perform the detection,
we take the corresponding cumulative distributions. For
the background, the cumulative distribution is given by

$$P(\delta) = \frac{\Gamma(\alpha, \beta/\delta)}{\Gamma(\alpha)} \qquad (2)$$

and depends on the input parameters A, $\gamma$ through $\alpha$
and $\beta$. The maximum of the difference between the
background (dashed) and the total (solid) distributions
corresponds to the point where the total distribution
increases faster than the background. This point is a
natural choice for the threshold $\delta^*$ (vertical line).
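The maximum-difference criterion can be sketched as follows, assuming that the total distribution is estimated by the empirical cumulative distribution of the measured cell densities. The background is the regularized upper incomplete gamma function evaluated at $\beta/\delta$, which follows from integrating Eq. 1 over $x > 1/\delta$. Function names are illustrative only.

```python
import numpy as np
from scipy.special import gammaincc

def background_cdf(delta, alpha, beta):
    """Fraction of background cells with density below delta.

    gammaincc(a, x) is the regularized upper incomplete gamma function,
    Gamma(a, x) / Gamma(a).
    """
    return gammaincc(alpha, beta / delta)

def density_threshold(delta, alpha, beta):
    """Density at which (background CDF - empirical CDF) is largest."""
    d = np.sort(np.asarray(delta, dtype=float))
    empirical = np.arange(1, len(d) + 1) / len(d)
    gap = background_cdf(d, alpha, beta) - empirical
    return d[np.argmax(gap)]

# Toy field: 90% background drawn from Eq. 1 with alpha = beta = 4, plus a
# 10% Gaussian cluster component in density (all values are illustrative).
rng = np.random.default_rng(0)
x_background = rng.gamma(shape=4.0, scale=0.25, size=90000)
toy_delta = np.concatenate([1.0 / x_background,
                            rng.normal(5.0, 0.5, size=10000)])
toy_threshold = density_threshold(toy_delta, 4.0, 4.0)
```

In this toy field the threshold falls between the bulk of the background and the cluster component, where the cluster contribution begins to exceed the background.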




Fig. 2. - Differential and cumulative distributions of
normalized cell densities illustrating the process of detection in the VT
cluster finder. The dashed lines correspond to a background
distribution with A = 0.005 and $\gamma$ = 1.7. The solid lines correspond to
the distributions distorted by an artificial Gaussian-shaped cluster
contribution (dotted line). The vertical line is the threshold for
detection $\delta^*$. All cells above the threshold are selected as cluster
member candidates.

In the example above an artificial cluster contribution
with a particular shape was added to illustrate the
principle of detection. In the actual process, we work only
with the cumulative distributions. Once the threshold is
computed we select all the cells with $\delta > \delta^*$. We then
take the clumps of contiguous selected cells as cluster
candidates.
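Grouping the selected cells into clumps amounts to finding connected components of the cell adjacency graph. A minimal sketch, assuming that two cells are contiguous when their generating points share a Delaunay edge, and using SciPy instead of the Triangle library (names are illustrative):

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components
from scipy.spatial import Delaunay

def contiguous_clumps(points, selected):
    """Label groups of selected points whose Voronoi cells are adjacent.

    Returns an integer array with a clump label for each selected point
    and -1 for points that were not selected.
    """
    points = np.asarray(points, dtype=float)
    selected = np.asarray(selected, dtype=bool)
    simplices = Delaunay(points).simplices
    # Every Delaunay edge joins two points whose Voronoi cells share a side.
    edges = np.vstack([simplices[:, [0, 1]],
                       simplices[:, [1, 2]],
                       simplices[:, [0, 2]]])
    keep = selected[edges[:, 0]] & selected[edges[:, 1]]
    i, j = edges[keep, 0], edges[keep, 1]
    n = len(points)
    graph = coo_matrix((np.ones(len(i)), (i, j)), shape=(n, n))
    _, labels = connected_components(graph, directed=False)
    return np.where(selected, labels, -1)
```

The labels returned for selected points are arbitrary integers; only their equality matters when collecting the members of each candidate.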

Setting the threshold at the point of maximum differ- 
ence between the two distributions leads to the detection 
only of the central regions of the most massive clusters 
{M > IO^^'-^Mq). This is a consequence of the fact that 
the two-point correlation function of the field includes 
the contribution of clusters, and only the highest density
peaks deviate significantly from the distribution
predicted by Eq. 2. To improve this result, we allow the
threshold to be an adjustable parameter, called scl. By comparing
the two-point correlation function of galaxies measured
by Davis & Peebles (1983) in the 14.5 $m_B$ CfA redshift
survey with the two-point correlation function of rich
($R \geq 1$) Abell clusters measured by Bahcall & Soneira
(1983), Bahcall (1986) has estimated that ~ 25% of all
galaxies are associated with clusters and the 10 Mpc
scale structures that surround them. We therefore set
our threshold at the point $\delta^*$ where the cumulative
distribution reaches ~ 75%. As this fraction must change
with redshift, magnitude limit of the galaxy catalog and
lower mass limit of the cluster catalog, we determine the
exact values of the cumulative distribution used to set $\delta^*$
in each redshift bin, scl(z), by applying the cluster finder
on simulated galaxy catalogs and maximizing the
completeness and purity of the output catalog. This process
does introduce a free parameter that we must tune.

2.3. Selection of high-significance candidates and
membership assignment

For a given threshold $\delta^*$, we assume that each cluster
candidate has a probability



p{Smin,Ng) = 1 - Erf 



1 



of being caused by random fluctuations of the
background field. Here $\delta_{min}$ is the minimum cell density and
$N_g$ is the number of galaxies in the candidate. Note that
the process of detection implies $\delta_{min} > \delta^*$. A confidence
level of 95% is required for a candidate to be accepted.
If a given candidate falls short of this level, we
iterate on its cells, dropping the one with lowest density
and recomputing $p(\delta_{min}, N_g)$, until this candidate reaches
the acceptable level or runs out of galaxies. As a
result, some cluster candidates will be reduced in size and
others will be eliminated. The final list of candidates is
composed of clusters above the required confidence level.
This cleaning process is necessary as the $\delta^*$ threshold is
set to be permissive; the estimate by Bahcall (1986) that
~ 25% of all galaxies are associated with clusters was
accompanied by a hypothesis that these galaxies were
distributed in ~ 30 Mpc scale overdense regions about
clusters, while we aim to detect clusters closer to the
~ 1 Mpc virial scale. This process results in a list of
cluster members, given by all the galaxies within the
final VT footprint of the cluster. The galaxy belonging to
the cell of highest density is taken as the central galaxy.

The accuracy of the membership assignment is limited
by the errors in the redshift of the galaxies and the width
of the redshift shell. As discussed in section 2.5, the
membership list is improved in the second run of the VT
cluster finder, which is performed in boxes centered at
the central galaxies flagged during this first run.



2.4. Shape measurement 

To obtain the cluster shape parameters, we take the
galaxies within the cluster VT footprint and compute the
second moments of the galaxy distribution with respect
to the coordinates $(x_c, y_c)$ of the central galaxy, using the
cell densities $\delta_i$ as weights. These second moments are:

$$m_{xx} = \frac{\sum_i \delta_i (x_i - x_c)^2}{\sum_i \delta_i}, \quad
m_{yy} = \frac{\sum_i \delta_i (y_i - y_c)^2}{\sum_i \delta_i}, \quad
m_{xy} = \frac{\sum_i \delta_i (x_i - x_c)(y_i - y_c)}{\sum_i \delta_i} \qquad (4)$$

where the x and y directions are aligned with the RA
and Dec axes, respectively. We use these quantities to
compute the semi-major and semi-minor axes, a and b,
respectively:

$$a = \left[\frac{1}{2}\left(m_{xx} + m_{yy} + f\right)\right]^{1/2}, \quad
b = \left[\frac{1}{2}\left(m_{xx} + m_{yy} - f\right)\right]^{1/2} \qquad (5)$$

where

$$f = \left[(m_{xx} - m_{yy})^2 + 4 m_{xy}^2\right]^{1/2}$$



The position angle is also obtained in terms of the same 
quantities. 



PA =1^ tan- f^l^ 

TT V TO. 



xy 



(6) 



and is given in degrees. 
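A minimal sketch of the shape measurement follows. It assumes that the second moments are normalized by the sum of the weights and that the axes are the square roots of the eigenvalues of the moment matrix. The position angle is computed here from the standard principal-axis relation, measured from the x (RA) axis toward the y (Dec) axis; this convention is an assumption of the sketch and may differ from the one adopted in Eq. 6.

```python
import numpy as np

def cluster_shape(x, y, delta, xc, yc):
    """Density-weighted semi-axes (a, b) and position angle in degrees."""
    x, y, delta = (np.asarray(v, dtype=float) for v in (x, y, delta))
    w = delta / delta.sum()          # normalized weights (assumption)
    dx, dy = x - xc, y - yc
    mxx = np.sum(w * dx**2)
    myy = np.sum(w * dy**2)
    mxy = np.sum(w * dx * dy)
    f = np.sqrt((mxx - myy)**2 + 4.0 * mxy**2)
    a = np.sqrt(0.5 * (mxx + myy + f))
    # Guard against tiny negative values caused by rounding.
    b = np.sqrt(max(0.5 * (mxx + myy - f), 0.0))
    # Principal-axis angle from the x axis (assumed convention).
    pa = np.degrees(0.5 * np.arctan2(2.0 * mxy, mxx - myy))
    return a, b, pa

# Five equally weighted galaxies along the diagonal x = y.
t = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
a_diag, b_diag, pa_diag = cluster_shape(t, t, np.ones(5), 0.0, 0.0)
```

For this diagonal configuration the sketch returns a = 2, b = 0 and a position angle of 45 degrees.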



2.5. Catalog construction 

A global list of cluster candidates is made by
merging the results of the individual shells. For each cluster
in that list we extract from the full input galaxy catalog
(not the z shells) a 3-dimensional box centered at its
central galaxy and with the same size as in the first run: 3 x 3
square degrees and $\sigma_z$ width. These boxes are processed
with the VT algorithm, repeating the steps described in
§2.1-2.4, and a new global list of cluster candidates is
constructed, taking only the clusters found at the center of
each box.

We perform a matching between the two global lists.
In this matching scheme, candidates are considered the
same cluster if they share more than 50% of their
galaxies, and multiple matches are not allowed. When a
match occurs, that cluster is eliminated from the list of
candidates available for matching with other candidates.
The clusters found in the first run but undetected in
the second run are eliminated as projection effects. The
primary function of this stage, however, is to deal with
photo-z slice edge effects.

Because the new boxes are allowed to cross the initial
shell boundaries, edge effects in the redshift dimension
are eliminated. Clusters split in several components
during the initial detection will result in cluster candidates
with a number of shared galaxies after the second run.
For a given pair of candidates found to be the same
cluster (i.e., sharing more than 50% of their galaxies), only
the one with the largest number of members is added to
the final cluster catalog. Otherwise, they are said to be
distinct clusters with shared galaxies (which are flagged
in the members list) and both are included in the
cluster catalog. Setting the threshold of shared galaxies to
50% is a natural choice between the two extremes where
all candidates would be duplicated or only the clusters
found with the same set of member galaxies would be
accepted.
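The duplicate-resolution rule can be sketched as follows. The text does not specify the denominator of the shared fraction; this sketch assumes it is measured relative to the smaller candidate, and it resolves duplicates greedily from the richest candidate down. Function names and the input format (lists of galaxy IDs) are illustrative.

```python
def shared_fraction(members_a, members_b):
    """Fraction of the smaller candidate's galaxies also found in the other."""
    a, b = set(members_a), set(members_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

def merge_candidates(candidates, threshold=0.5):
    """Keep the richer candidate of any pair sharing more than `threshold`."""
    ordered = sorted(candidates, key=len, reverse=True)
    kept = []
    for candidate in ordered:
        if all(shared_fraction(candidate, k) <= threshold for k in kept):
            kept.append(candidate)
    return kept

# [3, 4, 5] shares 2 of its 3 galaxies with [1, 2, 3, 4] and is dropped;
# [4, 8, 9, 10] shares only 1 of 4 and survives as a distinct cluster.
example = merge_candidates([[1, 2, 3, 4], [3, 4, 5], [6, 7], [4, 8, 9, 10]])
```

Candidates that survive but still share galaxies, such as the first and last lists in the example, correspond to distinct clusters whose common galaxies would be flagged as shared.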

At this point the detection is completed. We have the 
final list of clusters containing RA, Dec, redshift and a 
list of member galaxies including the parameters of the 
corresponding VT cells. This forms the VT footprint of 
the cluster. The cluster redshift is estimated as the me- 
dian of the redshift of the cluster members. The quantity 
is better estimated in the second run after a cleaner mem- 
bership list is obtained, so as to avoid projection effects 
along the line of sight. 

The output parameters of the VT cluster catalog are:
ID, RA, Dec (coordinates of its central galaxy or the
highest density peak), z (given by the median of all
members), $\sigma_z$ (rms value), $\delta_c$ (density contrast
measured at the final stage of detection), $\sigma$
(significance of detection), richness (number of members),
size (radius of the circle enclosing all galaxies), a
(semi-major axis), b (semi-minor axis) and PA (position
angle).

We also report a members list containing: ID, host ID
(most likely host cluster), cell density, shared flag (1 if the
galaxy is shared with another cluster, 0 otherwise) and
central flag (1 for central galaxy, 0 for regular members).
Note that we do not list every possible galaxy-cluster
association in the output. Galaxies not associated with any
cluster are listed with host ID, shared flag and central
flag set to -1. These non-member galaxies can be used,
for instance, to compute the local density of non-member
galaxies around a cluster or to run afterburners to measure
cluster properties such as richness and $R_{200}$.

Having a list of members generated by the cluster
finder is highly desirable, because properties such as the
optical richness and $R_{200}$ can be estimated. The lack
of membership assignment in VT implementations using
magnitudes was a drawback, and we improve on that
point. This also allows us to compute the algorithm
efficiency as follows.

3. ALGORITHM EFFICIENCY 

The effectiveness of the algorithm is evaluated by mea- 
suring the VT catalog completeness and purity as a func- 
tion of mass and redshift. These quantities are the se- 
lection function needed to understand the catalog. The 
completeness and purity are best measured with mock 
galaxy catalogs with known relations to dark matter ha- 
los. The field can no longer be advanced by placing single 
clusters in the center of an image with random back- 
grounds. 

We apply the algorithm to a mock galaxy catalog and
match the resulting cluster catalog with the corresponding
list of halos in the mock (the truth table). This
allows us to define completeness as the fraction of halos 
with a VT cluster counterpart and purity as the fraction 
of VT clusters with a matching halo. We perform this 
in bins of redshift and we also estimate the impact of 
redshift errors. 

3.1. Mock catalogs 

Mock galaxy catalogs are created using the ADDGALS
code (Busha & Wechsler 2008; Wechsler 2004; see also
Gerdes et al. 2010, Appendix A). ADDGALS takes an
N-body simulation light cone and attaches galaxies to its
dark matter particles to create a deep mock photometric
catalog using an N-body simulation with only modest
mass resolution. The resulting galaxy catalog reproduces
the luminosity function, the magnitude dependent 2-point
correlation function and the color-density-luminosity
distribution measured from the SDSS data. The mock
catalogs used here were based on the Hubble Volume
simulation that modeled a 3 Gpc/h box with $1024^3$
particles in a flat $\Lambda$CDM cosmology with
$\Omega_M = 0.3$ and $\sigma_8 = 0.9$ (Evrard et al. 2002).

ADDGALS first builds a list of galaxy r-band luminosities
drawn from a luminosity function $\phi(M_r)$, and
assigns these galaxies to individual dark matter
particles in the simulation. Here, $\phi(M_r)$ is the
observed SDSS r-band luminosity function at redshift ~ 0.1
from Blanton et al. (2003) assuming passive evolution of
1.3 magnitudes per unit redshift. These galaxies are
then mapped to individual dark matter particles using
a probability relation $P(R_\delta \mid L_r/L_*)$ that relates the
local dark matter overdensity to the luminosity of a galaxy.
Overdensities of dark matter are computed using the
characteristic radius $R_\delta$, defined as the radius enclosing
$1.8 \times 10^{13}\,h^{-1}$ solar masses of dark matter. The form of
$P(R_\delta \mid L_r/L_*)$ is taken to be a Gaussian plus a log-normal
representing galaxies in the "field", i.e., unresolved
low-mass halos, and those in higher mass, well-resolved
"halos." The exact form of this relation is

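$$P(R_\delta \mid L_r/L_*) = \frac{\cdots}{R_\delta \sqrt{2\pi}\, \sigma_c(L_r/L_*)}\, e^{-\left[\ln(R_\delta) - \mu_c(L_r/L_*)\right]^2 / 2\sigma_c(L_r/L_*)^2} + \cdots \qquad (7)$$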



The exact values of the parameters for this function are 
determined using a Monte Carlo Markov Chain analysis, 
imposing that the observed magnitude dependent 2-point 
correlation function is matched. 

The next step is to assign galaxy colors. The local
galaxy density is computed for each galaxy in the simulation
and in a training set of galaxies from the magnitude-limited
SDSS DR6 catalog using the projected distance
to the 5th nearest neighbor in a bin of redshift as in
Cooper et al. (2007). Each mock galaxy is assigned the
SED of a randomly selected SDSS galaxy with similar
local galaxy density and absolute magnitude $M_r$. When
doing this matching, we do not match absolute measurements
of the densities, but instead opt for a relative
matching where the SEDs from the densest galaxies in
our training set are matched to the densest galaxies in
the mock. This lets us assign SEDs more robustly to
higher redshift objects where our training set is incomplete.
The SED is then k-corrected and the appropriate
filters are applied to obtain SDSS colors. At high
redshift, color information is extrapolated from low
redshifts: r-band magnitudes are passively evolved before
selecting the SED from our training-set galaxy, which
is then k-corrected assuming that the rest-frame colors
and the color-density-luminosity distribution remain
unchanged.
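
The idea of relative (rank-based) density matching can be illustrated with a short sketch. It is not the ADDGALS implementation: the function name is hypothetical, and for simplicity we ignore the absolute magnitude and the random selection among similar training galaxies, keeping only the mapping between density percentiles.

```python
def match_by_density_rank(mock_densities, training_densities):
    """Pair each mock galaxy with the training galaxy at the same density rank.

    Both inputs are sequences of a density proxy (any quantity that orders
    galaxies by local density in the same sense for both samples).
    Returns a list whose i-th entry is the index of the training galaxy
    assigned to mock galaxy i.
    """
    n_mock, n_train = len(mock_densities), len(training_densities)
    mock_order = sorted(range(n_mock), key=lambda i: mock_densities[i])
    train_order = sorted(range(n_train), key=lambda j: training_densities[j])

    assignment = [None] * n_mock
    for rank, i in enumerate(mock_order):
        # Map the mid-point percentile of this mock galaxy onto the
        # training sample, using integer arithmetic to avoid rounding issues.
        j = (2 * rank + 1) * n_train // (2 * n_mock)
        assignment[i] = train_order[j]
    return assignment
```

Because only ranks are compared, the densest mock galaxies receive SEDs from the densest training galaxies even when the absolute density scales of the two samples differ.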

The resulting catalog reproduces the overall photometric
and clustering properties of the SDSS galaxies
at low redshifts (z ~ 0.3) and extends, using simplified
assumptions, to higher redshifts (z ~ 1.3) and deeper
magnitudes (r ~ 24). The brightest cluster galaxies
(BCGs), however, are an exception. BCG luminosities
are tightly correlated with their host halo mass and are
not reproduced by this method. Therefore, a BCG
luminosity is calculated for each resolved halo (of mass
~ $5 \times 10^{13}\,h^{-1} M_\odot$ and above) using the measurements
from Hansen et al. (2005) before the usual galaxy-to-dark
matter particle assignment begins. The corresponding
galaxies are then removed from the initial list of
galaxies and placed at the centers of their host halos.

We run our cluster finder on the mock catalog and
compare our results with the truth table. The quantities
featured in the truth table are R.A., Dec, redshift
and $M_{200}$, plus a list of member galaxies of each halo. In
this paper, we refer to the truth table as the halo catalog,
and to the VT output as the cluster catalog. The
quantities we use as inputs are: R.A., Dec, and photometric
redshift. We generate photometric redshifts from
the true redshifts, using a Gaussian distribution of width
$\sigma_z(1 + z)$. We test four different values of $\sigma_z$, namely
0.015, 0.03, 0.045 and 0.06, to assess the impact of the
photometric redshift errors on our cluster finder.
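
A minimal sketch of this photometric redshift model is given below. The function name and the seeding convention are our own choices for illustration; the only ingredient taken from the text is the Gaussian scatter of width $\sigma_z(1 + z)$ around the true redshift.

```python
import random


def scatter_redshifts(z_true, sigma_z, seed=0):
    """Return mock photometric redshifts for a list of true redshifts.

    Each value is drawn from a Gaussian centered on the true redshift
    with standard deviation sigma_z * (1 + z_true).
    """
    rng = random.Random(seed)  # fixed seed so the mock is reproducible
    return [z + rng.gauss(0.0, sigma_z * (1.0 + z)) for z in z_true]


# The four error levels tested in the text.
SIGMA_Z_VALUES = (0.015, 0.03, 0.045, 0.06)
```

Note that this simple model can return negative values for galaxies at very low redshift; how such cases are handled is not specified in the text and is left out of the sketch.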

The discussion so far was restricted to a perfect volume 
limited galaxy catalog. A real galaxy catalog, however, 
will have an irreducible level of contamination and in- 
completeness. Here we mimic the effects of these two 
quantities in the mocks by assuming that the input 
galaxy catalog has a completeness function given by a 
Fermi-Dirac distribution 



$$C_g(r) = \frac{f_0}{1 + \exp\left((r - \mu)/\sigma\right)} \qquad (8)$$



where $\mu$ is the magnitude limit of the catalog, $f_0$ is a
normalization constant and the parameter $\sigma$ controls how
fast the completeness falls when the magnitude limit is
reached. The parameters $f_0$ and $\sigma$ are taken from
processing of the SDSS data with the 2DPHOT package
(La Barbera et al. 2008). We found that $f_0 = 0.99$ and
$\sigma = 0.2$ are typical values. We degrade the mock catalogs
using $\mu = 23.5$, interpreting $C_g(r)$ as the probability that
a galaxy of magnitude r is detected. Similarly, from the
SDSS data we infer that a small fraction of contaminants,
due to misclassified stars, can be present in the input
catalog. The fraction of misclassified objects increases
exponentially for magnitudes above $\mu - 1.5$. We take this fact
into account by generating false galaxies randomly above
this limit and drawing from Eq. (8) the probability that this
object is actually added to the catalog.
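
The incompleteness step can be sketched as below, using the parameter values quoted in the text as defaults. The function names are hypothetical, and the contamination by false galaxies is deliberately left out because its normalization is not given here.

```python
import math
import random


def completeness(r, mu=23.5, f0=0.99, sigma=0.2):
    """Fermi-Dirac completeness: probability that a galaxy of magnitude r
    is detected, for magnitude limit mu, normalization f0 and width sigma."""
    return f0 / (1.0 + math.exp((r - mu) / sigma))


def degrade(magnitudes, seed=0, **params):
    """Return the magnitudes of the galaxies that survive the selection.

    Each galaxy is kept with probability completeness(r, **params).
    """
    rng = random.Random(seed)
    return [r for r in magnitudes if rng.random() < completeness(r, **params)]
```

By construction the completeness equals $f_0/2$ at the magnitude limit, is close to $f_0$ a few magnitudes brighter, and is negligible a few magnitudes fainter.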

3.2. Membership matching 

The evaluation of completeness and purity requires a
well defined matching scheme between the cluster catalog
and the truth table. We use a membership-based
matching method. Membership matching has been used
in evaluating completeness and purity of both photometric
and spectroscopic catalogs (White & Kochanek 2002;
Eke et al. 2004; Gerke et al. 2005; Koester et al. 2007a).
Unlike cylindrical matching, which has been widely
employed in this kind of study, this method is parameter-free,
unambiguous and provides the means to evaluate
the efficiency of the cluster finder as a function of halo
mass regardless of the observable proxy for mass. This
allows us to distinguish the aspects relevant to the cluster
finding problem from aspects connected to the
mass-observable proxy calibration, which is a problem in itself
and is better addressed by a separate set of post-finding
algorithms.

The inputs for the matching code are the halo catalog 
and the cluster catalog. The first is ranked by mass while 
the latter is ranked by the number of galaxies, both in 
descending order and in bins of redshift. It is critical to 
do the ranking in bins of redshift for both the halos and 
the clusters. In the case of halos, the mass function is 
evolving, so the masses will be changing at fixed rank. 
In the case of the clusters, the flux limit forces a changing
luminosity limit with redshift, so the ranks will be
changing at fixed mass. If this is not taken into account,
a massive cluster at high-z (z ~ 1) will get a much lower
rank than a massive cluster at low-z (z ~ 0.1).
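
Ranking within redshift bins can be sketched as follows. The data layout (a list of dictionaries with `id` and `z` fields) and the function name are assumptions made for this illustration; the same routine would be called with the mass as key for halos and with the number of member galaxies as key for clusters.

```python
from bisect import bisect_right


def rank_in_redshift_bins(objects, z_edges, key):
    """Rank objects in descending order of `key` inside each redshift bin.

    `objects` is a list of dicts with fields 'id', 'z' and `key`.
    `z_edges` is an increasing list of bin edges; bins are half-open,
    [z_edges[i], z_edges[i + 1]). Objects outside the edges are ignored.
    Returns {id: (bin_index, rank)} with rank 1 for the top object.
    """
    bins = {}
    for obj in objects:
        i = bisect_right(z_edges, obj["z"]) - 1
        if 0 <= i < len(z_edges) - 1:
            bins.setdefault(i, []).append(obj)

    ranks = {}
    for i, members in bins.items():
        members.sort(key=lambda o: o[key], reverse=True)
        for rank, obj in enumerate(members, start=1):
            ranks[obj["id"]] = (i, rank)
    return ranks
```

In this scheme the most massive halo of the 0.9 < z < 1.1 bin receives rank 1 in its own bin, regardless of how many more massive halos exist at low redshift.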

After ranking, the first step is to fit a rank-mass relation
$R(M)$ to the ranks provided by the cluster catalog and
the masses provided by the matched halo catalog. We use
the fitting formula



$$R(M) = \cdots\, \exp\left(\cdots\right) \exp\left(\cdots\right) \qquad (9)$$



This relation has no motivation other than to serve as a
global fitting function, valid at all redshifts provided that the
ranking is performed as described above. For our mock
catalogs, the best fit parameters for this fitting function
are $M_p = 2.26 \times 10^{\ldots}$, $M_e = 1.40 \times 10^{\ldots}$, $M_0 = 1.85 \times 10^{\ldots}$,
$M_1 = 1.85 \times 10^{\ldots}$ and $\alpha = -1.15$. We then invert the
relation above to compute an "observed mass" for each
cluster and proceed to the matching. If the proxy used to
rank the clusters has a tight correlation with mass, the
ranking will be accurate and the observed mass will show
a tight correlation with the true mass for the matched
pairs. It is important to note that the use of ranking
instead of observed mass does not require the
mass-observable relation to be calibrated. Moreover, neither
mass information nor the ranking is used in the matching
process, which is membership-based.

A match takes place if a fraction of member galax- 
ies is shared by a halo-cluster pair. The best match is 
the object sharing the largest fraction of galaxies. We 
require unique matching, in which a given halo/cluster 
is not allowed to be associated with more than one clus- 
ter/halo. As both lists are ranked by number of galaxies, 
uniqueness is imposed by eliminating a matched object 
from the list of available objects for future matches down 
the list. We also require two-way matching, where the 
best matching pair is found when the matching is per- 
formed in both directions, halos-to-clusters and clusters- 
to-halos. 
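
The unique two-way matching described above can be sketched as follows. This is an illustration under our own assumptions, not the matching code used for the paper: the function names are hypothetical, both catalogs are passed as dictionaries in ranked order (Python dictionaries preserve insertion order), and the shared fraction is measured relative to the object being matched.

```python
def best_match(members, candidates, taken):
    """Return the ID in `candidates` sharing the largest fraction of `members`.

    Candidates whose ID is in `taken` are skipped. Returns None when no
    available candidate shares at least one galaxy.
    """
    if not members:
        return None
    best_id, best_frac = None, 0.0
    for cid, other in candidates.items():
        if cid in taken:
            continue
        frac = len(members & other) / len(members)
        if frac > best_frac:
            best_id, best_frac = cid, frac
    return best_id


def two_way_unique_match(halos, clusters):
    """Match halos to clusters by membership.

    `halos` and `clusters` map IDs to sets of galaxy IDs, in ranked order.
    A pair is accepted only if each object is the best match of the other
    (two-way), and matched objects are removed from further matching
    (uniqueness). Returns {halo_id: cluster_id}.
    """
    matches = {}
    used_clusters = set()
    for hid, h_members in halos.items():
        cid = best_match(h_members, clusters, used_clusters)
        if cid is None:
            continue
        # Reverse direction: the cluster must point back to this halo.
        back = best_match(clusters[cid], halos, set(matches))
        if back == hid:
            matches[hid] = cid
            used_clusters.add(cid)
    return matches
```

A halo whose best cluster is better explained by another halo is left unmatched at that step, and the cluster remains available further down the list.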

We note that this approach to cluster-halo matching is 
quite general and can be applied to any cluster-finding 
algorithm that produces a list of cluster members. It will 
be developed in more detail as a framework for comparing
different algorithms and establishing their usefulness for
cosmological tests (Gerke et al. 2011).

3.3. Completeness & Purity 

Completeness is defined as the fraction of halos hav- 
ing a counterpart in the cluster catalog. Purity in turn 
is defined as the fraction of objects in the cluster catalog
that correspond to a true halo. In both cases, only
unique two-way matches are considered. Allowing for 
non-unique matching, where each cluster may have more 
than one matching halo and vice-versa, would be a more 
permissive approach. For instance, purity would not be 
affected by a halo being split in two components and com- 
pleteness would not be affected by two halos appearing 
as a single cluster. 

We count the number of matched objects in bins of 
mass and redshift. Therefore, 



$$C(M, z) = \frac{N_{\rm matched}(M, z)}{N_{\rm halos}(M, z)} \qquad (10)$$

$$P(M, z) = \frac{N_{\rm matched}(M, z)}{N_{\rm clusters}(M, z)} \qquad (11)$$
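
The counting in bins can be sketched with a single helper. The function name and the representation of a bin as a `(mass_bin, z_bin)` tuple are assumptions of this illustration; the same function gives completeness when applied to halos and purity when applied to clusters.

```python
from collections import Counter


def fraction_by_bin(all_cells, matched_cells):
    """Fraction of matched objects in each (mass_bin, z_bin) cell.

    `all_cells` lists the cell of every object in a catalog and
    `matched_cells` the cell of every matched object of that catalog.
    Cells that contain no objects at all are not reported.
    """
    total = Counter(all_cells)
    matched = Counter(matched_cells)
    return {cell: matched[cell] / n for cell, n in total.items()}
```

For completeness the cells would be built from the halo catalog, and for purity from the cluster catalog, so the two calls generally use different binnings of the same matched pairs.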



Note that $C(M, z)$ can be computed using the true mass
of the halos, being totally independent of the mass proxy
used to rank the clusters. The true mass of the clusters,
however, is available only for the matched objects.
Therefore $P(M, z)$ has to be computed using the observed
mass and does depend on the ranking. We fit a
power law to the $M_{\rm obs}$-$M_{\rm true}$ relation from the matched
objects and use it to transform the scale in the $P(M, z)$
plots and show both completeness and purity as a function
of $M_{\rm true}$. This cannot be performed before the rank-mass
relation fitting step, which is part of the matching
process. This method allows us to evaluate the efficiency
of any cluster finder imposing minimum requirements,
namely a list of members for each cluster. The selection
function can be defined in terms of completeness and purity
as

$$f(M, z) = \frac{C(M, z)}{P(M, z)} \qquad (12)$$

This is a simplified definition. For cosmological studies
with real data, $f(M, z)$ should be defined and evaluated
in a likelihood analysis that includes the scatter in the
mass-observable relation after calibration. Here, however,
we simply want to compare the observed cluster
number counts $N_{\rm obs}(M, z)$ to the predictions from the
$\Lambda$CDM cosmological model $N_{\Lambda{\rm CDM}}(M, z)$. In this case,
the selection function is easily taken into account:

$$N_{\rm obs}(M, z) = f(M, z)\, N_{\Lambda{\rm CDM}}(M, z) \qquad (13)$$



This comparison allows us to develop a feel for how well 
we can recover the true cluster number counts using the 
VT catalog and our ability to perform a cosmological test 
using VT clusters as a probe. 

The method described above is very simplified with 
respect to the procedures involved in an actual measure- 
ment of the mass function. This would require a mea- 
surement of the mass-observable relation and its scatter. 
We do not perform this because the VT cluster cata- 
log provides only Ngals, the number of galaxies on the 
membership list, as a mass proxy. This Ngals was not
optimized to have a tight relation with mass, unlike, for
example, the $\lambda$ estimator of Rozo et al. (2009). Measuring
and optimizing a mass proxy is a necessary step if the 
VT is to be used in performing cosmological tests. But 
this problem is better addressed by a separate algorithm, 
specifically designed to provide a calibrated mass proxy 
including the mean relation and the scatter.

4. RESULTS AND DISCUSSION

In Fig. 3 we show the completeness and purity as a
function of mass and redshift for different Gaussian $\sigma_z$
values. The photometric redshift errors have a strong
impact on both completeness and purity. For $\sigma_z = 0.015$,
completeness lies above 80% for all redshift bins and
masses above ~ $10^{13.5} M_\odot$. Purity, however, drops
significantly at the low mass end. We attribute this to the
fact that the range $10^{13.5}$ - $10^{14} M_\odot$ is at the lower
boundary of the halo catalogs associated with the mock catalog.
ADDGALS will populate some fraction of real dark matter
clumps in the simulation even if they are below the
threshold for detection in the halo catalog. A fraction of
these halos were populated with galaxies by ADDGALS,
but were not listed in the truth table. We have no means
to determine the exact fraction at this point and therefore
we interpret the purity curve as a lower limit.

In the high-redshift regime, completeness and purity
do not change much with $\sigma_z$. The lowest redshift bin,
however, shows the lowest purity and completeness in
almost all cases. This might be due to the large angular
size of clusters at low-z, as at z ~ 0.1 the target area
of 1 square degree corresponds to only a few times the
typical $R_{200}$. However, even in this case the VT catalog
achieves completeness and purity above ~ 80% at all
masses. Since we are most interested in a reliable catalog
at high redshifts, we consider the cluster finder efficiency,
as shown in Fig. 3, very good.

Note that the behavior of purity is qualitatively different
in the last panel, $\sigma_z = 0.060$. This may be connected
to low redshift clusters leaking to high redshift shells at
higher rates than the high redshift ones fall towards low
redshift.

Testing the effect of changes in the cluster finder free 
parameters on the completeness and purity functions, we 
find that: 

1. Changing the fraction of shared galaxies required
to consider two candidates as the same cluster in
the range 40-60 percent has less than 1% impact
on the results. We fix this value at 50%.

2. The selection function is very sensitive to scl(z).
Setting scl(z) too high (> 0.97) leads to fragmentation
of clusters, which affects purity at all masses,
and failure to detect low contrast clusters, which
affects completeness at the low mass end. Setting
scl(z) below 0.75 causes merging of clusters and
affects completeness. An optimal value for scl(z)
in the range 0.75-0.97 has to be found at each
redshift bin.

3. The confidence level threshold has little effect on
the detection. The final list of clusters shows less
than 10% difference when this parameter varies in
the range 90-99.5 percent. But it affects the selection
function by modifying the membership list.

Fig. 4 illustrates our ability to recover the true cluster
number counts of the input catalog. We take the case
$\sigma_z = 0.015(1 + z)$ and the redshift bin 0.9 < z < 1.1. For
a given mass bin $M_i$ we divide the number of VT clusters
detected by the selection function term $f(M_i, z)$. We
then sum the corrected counts through all bins of mass
> M (red solid line). The curve for the truth table is
obtained by counting all the halos above M (black dotted
line). We finally plot (blue dashed line) the values expected
in a $\Lambda$CDM cosmology (e.g., Evrard et al. 2002)
for comparison.
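
The correction of the counts can be sketched as below, taking the selection function as the ratio of completeness to purity in each mass bin. The function names are hypothetical, the input lists are assumed to be ordered by increasing mass, and the completeness is assumed to be non-zero in every bin.

```python
def selection_function(completeness, purity):
    """Simplified selection function f = C / P for one (M, z) bin."""
    return completeness / purity


def corrected_cumulative_counts(n_detected, completeness, purity):
    """Cumulative counts above each mass bin after the selection correction.

    The three inputs are lists with one entry per mass bin, ordered by
    increasing mass. Each detected count is divided by f = C / P and the
    result is summed over the bin itself and all more massive bins.
    """
    corrected = [n / selection_function(c, p)
                 for n, c, p in zip(n_detected, completeness, purity)]
    cumulative, running = [], 0.0
    for value in reversed(corrected):
        running += value
        cumulative.append(running)
    cumulative.reverse()
    return cumulative
```

For example, 100 detections in a bin with 80% completeness and 90% purity correspond to 112.5 corrected clusters.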

There is a remarkable agreement between the three 
curves. The tilt of the measured curve with respect to the 
truth table may be interpreted as low mass clusters being 
misplaced towards more massive bins, due to our neglect 
of the scatter in the mass-observable relation. As pointed
out in §3.3, the method used here does not take into
account crucial steps involved in an actual measurement
of the mass function. This issue must be addressed with a
full program of mass calibration and is beyond the scope
of this paper. The result shown in Fig. 4 encourages
the pursuit of such a program, though. Our results show
that the VT is a reliable cluster finder in the redshift and
mass range of interest, as seen in the completeness and
purity curves. Application of this algorithm to SDSS
data is underway and will be presented in a forthcoming
paper (Soares-Santos et al. 2010).

5. SUMMARY 

In this paper we present an improved implementation 
of the Voronoi Tessellation cluster finder. Improvements 
with respect to earlier works include: 

1. The use of photometric redshifts instead of magnitudes.

2. A more realistic assumption that galaxy fields
have a two-point correlation function described by a
power-law, and not by a Poisson distribution.

3. Implementation of a membership assignment
scheme.

The VT cluster finder in 2+1 dimensions was tailored
to fulfill the requirements of upcoming cosmological
experiments aiming at using clusters as probes for dark
energy. The main challenges towards this goal include the
construction of reliable cluster catalogs up to high
redshifts (z ~ 1) and down to low mass limits (~ $10^{13.5} M_\odot$)
and the measurement of the selection function as a function
of M and z. To achieve these goals using the VT
we:

1. Adapted the VT algorithm to use photometric redshift
shells and take advantage of the relation that
we have discovered between the two-point correlation
function of the galaxy field and its distribution
of VT cell areas.

2. Defined the selection function in terms of completeness
and purity, establishing an objective way to
measure these quantities using simulated catalogs.

3. Applied the VT to mock galaxy catalogs and com- 
puted the completeness and purity of the output 
cluster catalog with the truth table, showing that 
the VT can produce cluster catalogs with complete- 
ness and purity above 80% in the ranges of interest 
within the M-z parameter space. 

4. Computed the cluster abundance from the VT cat- 
alog and compared it to the halo abundance in the 
mocks, finding a remarkable agreement at all mass 
bins.

Fig. 3. - Completeness (left) and purity (right) curves as a function of mass for six redshift bins: 0.1 < z < 0.3 (blue), 0.3 < z < 0.4
(cyan), 0.4 < z < 0.6 (black), 0.6 < z < 0.7 (orange), 0.7 < z < 0.9 (purple), 0.9 < z < 1.1 (red). From top to bottom, the plot pairs
feature different $\sigma_z$ values: 0.015, 0.03, 0.045, 0.06. The photometric redshift errors have a strong impact on both completeness and purity.
In the best case, completeness and purity remain above 80% for all redshift bins and masses above ~ $10^{13.5} M_\odot$. In the case of purity, this curve
should be interpreted as a lower limit (see text for discussion).



These results allow us to be confident in our ability to 
perform a cosmological test for dark energy using the VT 
algorithm on a data set of sufficient scope. Analysis of 
the application of the VT to the SDSS data is underway 
and will be presented elsewhere.

APPENDIX

THE VORONOI TESSELLATION CELL AREAS DISTRIBUTION FOR POWER-LAW CORRELATED POINT PROCESSES 

Motivated by what is known about the two-point correlation function of galaxies in the Universe, we consider a
2-dimensional point field characterized by a two-point correlation function of the form

$$w(\theta) = A\,\theta^{1-\gamma} \qquad (A1)$$

where $\theta$ is a distance, A is the amplitude of the correlation and $\gamma$ is the slope of the power-law. A = 0 represents the
particular case of a Poisson field. We generate simulated fields spanning a wide range of the parameter space (A, $\gamma$) around
the measured values reported in the literature. These simulated fields are used to characterize the VT cell areas
distribution.

Although aimed at application in our cluster finder algorithm, this study allows us to investigate the connection between
this VT property and the statistical process underlying the generator set of points. This topic has been extensively discussed
(see Okabe (2000) for a review). For the Poisson case, simulations have been used to support the so-called Kiang's
conjecture that the distribution of standardized cell sizes (size/mean size) in n-dimensional space is given by

$$p(x) = \frac{\beta^\alpha}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\beta x} \qquad (A2)$$



with $\alpha = \beta = 2n$. This has been rigorously shown for n = 1 and studied in simulations up to n = 3. Here we extend
this conjecture to the case where the two-point correlation function of the field is given by a power-law. We focus on
n = 2. Our results indicate that Eq. A2 still holds but the parameters $\alpha$ and $\beta$ are modified. The relation $\alpha = 0.26 + \beta$
is found to be valid within the parameter space explored.

In the following sections we describe the simulations and the modeling of the area distribution. We discuss our 
results in comparison to the well-studied Poisson case and provide the relevant quantities in Table 1. 

Point field simulation 

To generate the simulated fields with the two-point correlation function given by Eq. A1 we implement the simulated
annealing method as proposed by Rintoul & Torquato (1997). This method is generally used to find the state of
minimum "energy" of a given system, by sampling the different states weighted by the probability of occurrence of
that state. Here, we take Eq. A1 as our "reference" state, and the state of the "system" is denoted as $w_s(\theta)$. We
consider logarithmic bins in $\theta$, and define the energy of the system as

$$E = \sum_i \left[ w_s(\theta_i) - w(\theta_i) \right]^2 \qquad (A3)$$

where the sum is over all bins. We use 10 bins in the interval $0.01 < \theta_i < 2$. This definition of energy is convenient
because it ensures that E decreases when the difference between $w_s$ and $w$ in any bin decreases.

The initial state is a Poisson state. To evolve the system towards $w(\theta)$, we choose a particle and move it to a random
position in the field. We compute the energy E' of this new configuration and obtain $\Delta E = E' - E$. The move is
accepted with probability

$$P(\Delta E) = \begin{cases} 1 & \Delta E \le 0 \\ \exp(-\Delta E / kT) & \Delta E > 0 \end{cases} \qquad (A4)$$

where kT is the "temperature" of the system. This is chosen to allow the system to evolve as quickly as possible to the 
minimum state, without getting trapped in local minima. The initial temperature is set to 1. We attempt to move all 
the particles sequentially and, after a complete round over all the N particles of the system, its temperature is cooled 
by a factor of 2. The system converges about 30% faster with this cooling schedule. 
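
The ingredients of the annealing loop can be sketched as follows. This is an illustration only: the function names are hypothetical, the energy is assumed to be the sum of squared differences between the measured and the reference correlation function over the bins, and the measurement of the correlation function itself (done with a fast Fourier transform code in our case) is not reproduced here.

```python
import math
import random


def target_w(theta, amplitude=0.005, gamma=1.7):
    """Reference power-law correlation function w(theta) = A * theta**(1 - gamma)."""
    return amplitude * theta ** (1.0 - gamma)


def energy(w_state, w_target):
    """Sum over bins of the squared difference between the current and the
    reference correlation function (assumed form of the energy)."""
    return sum((ws - wt) ** 2 for ws, wt in zip(w_state, w_target))


def accept_move(delta_e, temperature, rng):
    """Metropolis rule: always accept moves that do not increase the energy,
    otherwise accept with probability exp(-delta_e / temperature)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


def temperature_after(rounds, t0=1.0, factor=2.0):
    """Temperature after a number of complete rounds over all particles,
    starting from t0 and dividing by `factor` after each round."""
    return t0 / factor ** rounds
```

With an initial temperature of 1 and a cooling factor of 2, the temperature after 10 complete rounds is 1/1024, so uphill moves become very unlikely by the end of the run.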

In Fig. 5 we show one example, where A = 0.005 and $\gamma = 1.7$. This combination of parameters corresponds to typical
values measured, for instance, on SDSS data up to magnitude limit r' = 21.5 (Connolly et al. 2002). The initial system
is on the left, the field in the middle is the final state, after 10 rounds over all particles. The plot on the right shows
the evolution of the energy of the system. The difference between the initial and final states is not noticeable by eye
and a statistical method must be used to actually measure the two-point correlation function and compute $\Delta E$ at
each iteration. We use a fast Fourier transform code (Szapudi et al. 2005) to accomplish this. Using this method we
have generated 190 fields of 3 × 3 sq degrees and $1.6 \times 10^4$ particles.




Fig. 5. - The left plot shows the initial (Poisson) state of a system meant to evolve towards a configuration with A = 0.005 and $\gamma = 1.7$.
The final state is the one in the central plot. The right plot is the evolution of the energy of the system (normalized by its initial energy) as
a function of the iteration number normalized by the total number of particles in the system. Under this normalization, $n_{it}/N$ = 1, 2, 3...
refers to complete rounds over all particles in the field. This simulation was performed in a box of 3 × 3 sq deg containing $1.6 \times 10^4$ particles.
Just a 1 × 1 sq deg portion of the field is shown.



Gamma model for the VT cell distribution 

We apply the VT code to each of the simulated fields, obtain the distribution of normalized cell areas and find
the best fit Gamma model (Eq. A2). Fig. 6 shows, as an example, the VT diagram for the same system featured above.
The left and right diagrams correspond to the initial and final state of the system, respectively.

The result of the fit is shown in Fig. 7, again for the case A = 0.005 and $\gamma = 1.7$. For comparison we show as well
the traditional Kiang formula (dashed line). The results are $\alpha = 3.89 \pm 0.04$ and $\beta = 3.65 \pm 0.05$. Kiang's formula is
more than $5\sigma$ away from the best fit.

Fig. 6. - The Voronoi diagram corresponding to the two fields shown in Fig. 5. The initial and final states are on the left and right
panels, respectively.

Fig. 7. - Left: Best fit model for the distribution of normalized VT cell areas featured in Fig. 6. The curve for the Poisson case is also
shown for comparison (dashed line). Right: Fractional residuals of the fit.

The results for the ensemble of simulated fields studied are shown in Fig. 8. The values of $\alpha$ and $\beta$ fall in the range
$3.5 < \alpha < 3.9$ and $3.5 < \beta < 3.8$. The mean error in both is 0.04. There is a noticeable correlation between these
two parameters. The difference $\alpha - \beta$ is found to be $0.26 \pm 0.02$ over the whole parameter space explored. The model
parameters for the values of A and $\gamma$ considered are presented in Table 1.
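
For reference, the Gamma model can be evaluated with a few lines of code. The function name is our own; the sketch simply compares the Kiang values for n = 2 ($\alpha = \beta = 4$) with the best fit quoted above for A = 0.005 and $\gamma = 1.7$, and does not reproduce the fitting procedure.

```python
import math


def gamma_pdf(x, alpha, beta):
    """Gamma model for the distribution of normalized VT cell areas:
    p(x) = beta**alpha / Gamma(alpha) * x**(alpha - 1) * exp(-beta * x)."""
    return beta ** alpha / math.gamma(alpha) * x ** (alpha - 1.0) * math.exp(-beta * x)


# Kiang's values for a 2-dimensional Poisson field and the best fit
# reported in the text for the correlated field.
KIANG_2D = (4.0, 4.0)
BEST_FIT = (3.89, 3.65)

if __name__ == "__main__":
    for x in (0.5, 1.0, 2.0):
        print(x, gamma_pdf(x, *KIANG_2D), gamma_pdf(x, *BEST_FIT))
```

Both parameter sets give a properly normalized distribution, so the comparison between the two curves reflects a change in shape only.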









Fig. 8. - Density maps showing the results of the fit in the parameter space investigated. There is a noticeable correlation between the
two leftmost maps. The difference between these two maps is shown on the right.

TABLE 1
VT Model Parameters